We will use the NOAA data set.
Because the data set is large (1,417,635 entries), we first remove every row with a missing value (NA) in precipitation, minimum temperature, or maximum temperature, and then draw a random sample of 5,000 rows, setting the random seed to 1 so that the sample is reproducible.
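The cleaning and sampling step can be sketched in pandas as below. This is a minimal illustration, not the original analysis code. It assumes the data are already loaded into a DataFrame whose precipitation, minimum temperature, and maximum temperature columns are named `prcp`, `tmin`, and `tmax`. These column names, the helper `clean_and_sample`, and the toy data in the usage example are hypothetical. Because the exact rows drawn depend on the random number generator, a seed of 1 here will not necessarily select the same 5,000 rows as a seed of 1 in another tool.

```python
import numpy as np
import pandas as pd


def clean_and_sample(df: pd.DataFrame, n: int = 5000, seed: int = 1) -> pd.DataFrame:
    """Hypothetical helper: drop incomplete rows, then draw a reproducible sample.

    Assumes the columns are named "prcp", "tmin", and "tmax".
    """
    # Keep only rows where all three weather measurements are present.
    complete = df.dropna(subset=["prcp", "tmin", "tmax"])
    # Draw n rows without replacement; a fixed random_state makes the draw repeatable.
    # Raises ValueError if fewer than n complete rows remain.
    return complete.sample(n=n, random_state=seed)


if __name__ == "__main__":
    # Toy data standing in for the real NOAA table (values are synthetic).
    rng = np.random.default_rng(0)
    toy = pd.DataFrame({
        "prcp": rng.uniform(0, 50, 20000),
        "tmin": rng.uniform(-20, 20, 20000),
        "tmax": rng.uniform(0, 40, 20000),
    })
    toy.loc[::7, "tmin"] = np.nan  # inject some missing values
    sample = clean_and_sample(toy)
    print(sample.shape)
```

Dropping incomplete rows before sampling guarantees that every one of the 5,000 sampled rows has all three measurements, so the sample size is not reduced by missing values afterwards.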